
    Manipulating natural images by learning relationships between visual domains

    Manipulation of visual attributes of real images is a fundamental generative computer vision task. The goal is to alter specified visual attributes of a given input image while preserving all others. Manipulations can be global, such as changes in lighting or view angle, or spatially localized, such as the addition or removal of individual objects or actors, or changes to their appearance, pose, or expression. Most existing attribute manipulation methods are either hand-crafted for one very specific manipulation (e.g. Photoshop filters) or require a large dataset with attribute annotations to learn the desired manipulation in a supervised fashion. This requirement renders fully supervised methods prohibitively expensive in the many real application domains that lack large, densely annotated datasets. In this thesis, we investigate whether flexible attribute manipulation models can be trained without massive labeled datasets of real images by transferring knowledge about the desired manipulation across image datasets (domains) that share the same underlying structure. This transfer is often performed by transforming examples from one domain so that a given family of neural discriminators cannot distinguish them from examples of the other domain. This procedure is called unsupervised adversarial image alignment; in this thesis, we show that it suffers from training instability and introduce two new approaches for stabilizing the alignment: objective dualization and likelihood-ratio minimizing flows. We then propose a novel setup and method for manipulating natural images using only cross-domain supervision. Finally, we propose a new method for manipulating domain-specific and domain-invariant factors of variation in the absence of any supervision in either domain.
We show that the proposed cross-domain alignment objectives yield more stable solutions and that the proposed cross-domain image manipulation techniques successfully learn correspondences between factors of variation present across different visual domains.
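The adversarial alignment procedure the abstract describes can be illustrated with a toy 1-D numpy sketch (illustrative only, not the thesis's actual models or objectives): a "generator" consisting of a single learned shift maps domain A onto domain B by fooling a logistic-regression discriminator, via alternating gradient steps.

```python
import numpy as np

# Toy sketch of unsupervised adversarial alignment (illustrative,
# not from the thesis): domain A ~ N(0,1) is aligned to domain B ~ N(3,1)
# by a generator x -> x + theta trained against a logistic discriminator.

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

xa = rng.normal(0.0, 1.0, 512)   # domain A samples
xb = rng.normal(3.0, 1.0, 512)   # domain B samples

theta = 0.0                      # generator parameter: x -> x + theta
w, b = 0.0, 0.0                  # discriminator: D(x) = sigmoid(w*x + b)
lr = 0.05

for _ in range(3000):
    fake = xa + theta
    # Discriminator step: push D towards 1 on B, 0 on aligned A
    # (gradients of the binary cross-entropy in closed form).
    d_real = sigmoid(w * xb + b)
    d_fake = sigmoid(w * fake + b)
    grad_w = np.mean((d_real - 1.0) * xb) + np.mean(d_fake * fake)
    grad_b = np.mean(d_real - 1.0) + np.mean(d_fake)
    w -= lr * grad_w
    b -= lr * grad_b
    # Generator step: move theta so the discriminator labels aligned A
    # as real; gradient of -log D(x + theta) w.r.t. theta is -(1 - D) * w.
    d_fake = sigmoid(w * (xa + theta) + b)
    theta -= lr * np.mean(-(1.0 - d_fake) * w)

print(round(theta, 2))  # approaches the true domain gap of ~3.0
```

Even in this 1-D case the minimax dynamics spiral towards the equilibrium rather than descend monotonically, which hints at the training instability the thesis targets with its stabilized alignment objectives.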

    Towards Practical Non-Adversarial Distribution Alignment via Variational Bounds

    Distribution alignment can be used to learn invariant representations, with applications in fairness and robustness. Most prior works resort to adversarial alignment methods, but the resulting minimax problems are unstable and challenging to optimize. Non-adversarial likelihood-based approaches either require model invertibility, impose constraints on the latent prior, or lack a generic framework for alignment. To overcome these limitations, we propose a non-adversarial VAE-based alignment method that can be applied to any model pipeline. We develop a set of alignment upper bounds (including a noisy bound) that have VAE-like objectives but a different perspective. We carefully compare our method to prior VAE-based alignment approaches both theoretically and empirically. Finally, we demonstrate that our novel alignment losses can replace adversarial losses in standard invariant representation learning pipelines without modifying the original architectures, thereby significantly broadening the applicability of non-adversarial alignment methods.
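The flavor of a non-adversarial, VAE-style alignment penalty can be sketched as follows (a hedged illustration, not the paper's actual bounds): if each domain's encoder outputs a Gaussian posterior, pulling both posteriors towards one shared standard-normal prior bounds the mismatch between the two latent distributions with a closed-form KL term, so no discriminator or minimax game is needed.

```python
import numpy as np

# Illustrative sketch (not the paper's actual variational bounds):
# a non-adversarial alignment penalty built from closed-form KL terms.
# Encoder outputs below are hypothetical stand-ins for q_A(z|x), q_B(z|x).

def kl_to_std_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), per sample."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

# Hypothetical per-domain Gaussian posterior parameters (batch of 1, 2-D latent).
mu_a, logvar_a = np.array([[0.5, -0.2]]), np.array([[0.1, 0.0]])
mu_b, logvar_b = np.array([[0.0, 0.0]]), np.array([[0.0, 0.0]])

# Summed penalty: zero only when both posteriors equal the shared prior,
# which in turn forces the two domains' latent distributions to match.
align_loss = (kl_to_std_normal(mu_a, logvar_a).mean()
              + kl_to_std_normal(mu_b, logvar_b).mean())
print(float(align_loss))
```

Because the penalty is an ordinary minimization objective rather than a minimax game, it can be dropped into an existing invariant-representation pipeline as a plain additive loss term.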